THERMODYNAMICS OF THE BINARY SYMMETRIC CHANNEL


EVGENY VERBITSKIY 

Abstract. We study a hidden Markov process which is the result of a transmission 
of the binary symmetric Markov source over the memoryless binary symmetric channel. 
This process has been studied extensively in Information Theory and is often used as a 
benchmark case for the so-called denoising algorithms. Exploiting the link between this 
process and the 1D Random Field Ising Model (RFIM), we are able to identify the Gibbs
potential of the resulting Hidden Markov process. Moreover, we obtain a stronger bound 
on the memory decay rate. We conclude with a discussion on implications of our results 
for the development of denoising algorithms. 


1. Introduction 

We study the binary symmetric Markov source over the memoryless binary symmetric
channel. More specifically, let $\{X_n\}$ be a stationary two-state Markov chain with values
in $\{\pm 1\}$, and

\[ \mathbb{P}(X_{n+1} \neq X_n) = p, \]

where $0 < p < 1$. The binary symmetric channel will be modelled as a sequence of
independent Bernoulli random variables $\{Z_n\}$, independent of $\{X_n\}$, with

\[ \mathbb{P}_Z(Z_n = -1) = \varepsilon, \qquad \mathbb{P}_Z(Z_n = 1) = 1 - \varepsilon. \]

Finally, put

\[ Y_n = X_n \cdot Z_n \tag{1.1} \]

for all $n$. The process $\{Y_n\}$ is a hidden Markov process, because $Y_n \in \{-1,1\}$ is chosen
independently for any $n$ from an emission distribution $\pi_{X_n}$ on $\{-1,1\}$:
$\pi_1 = (\varepsilon, 1-\varepsilon)$ and $\pi_{-1} = (1-\varepsilon, \varepsilon)$.

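The law $Q$ of the process $\{Y_n\}$ is the push-forward of $\mathbb{P} \times \mathbb{P}_Z$ under
$\psi : \{-1,1\}^{\mathbb{Z}} \times \{-1,1\}^{\mathbb{Z}} \to \{-1,1\}^{\mathbb{Z}}$, with
$\psi((x_n, z_n)) = x_n \cdot z_n$. We write $Q = (\mathbb{P} \times \mathbb{P}_Z) \circ \psi^{-1}$. For every
$m \le n$ and $y_m^n := (y_m, \ldots, y_n) \in \{-1,1\}^{n-m+1}$, the measure of the corresponding
cylindric set is given by

\[
\begin{aligned}
Q(y_m^n) := Q(y_m, \ldots, y_n)
 &= \sum_{x_m^n,\, z_m^n \in \{-1,1\}^{n-m+1}} \mathbb{P}(x_m^n) \prod_{k=m}^{n} \mathbb{P}_Z(z_k)\, \mathbb{1}[y_k = x_k \cdot z_k] \\
 &= \frac{1}{2} \sum_{x_m^n \in \{-1,1\}^{n-m+1}} \prod_{i=m}^{n-1} p^{\mathbb{1}[x_i \ne x_{i+1}]} (1-p)^{\mathbb{1}[x_i = x_{i+1}]}
    \prod_{i=m}^{n} \varepsilon^{\mathbb{1}[x_i \ne y_i]} (1-\varepsilon)^{\mathbb{1}[x_i = y_i]}.
\end{aligned}
\tag{1.2}
\]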
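Formula (1.2) can be checked numerically for short words by enumerating all hidden paths. The following is a minimal sketch under that reading of (1.2); the function names `markov_prob` and `cylinder_prob_bruteforce` are illustrative and do not come from the paper.

```python
from itertools import product

def markov_prob(x, p):
    """Probability of the path x under the stationary symmetric chain (flip probability p)."""
    prob = 0.5  # stationary distribution is uniform on {-1, 1}
    for a, b in zip(x, x[1:]):
        prob *= p if a != b else 1.0 - p
    return prob

def cylinder_prob_bruteforce(y, p, eps):
    """Q(y) as in (1.2): sum over hidden paths x; the noise z_k = x_k * y_k is then forced."""
    total = 0.0
    for x in product((-1, 1), repeat=len(y)):
        noise = 1.0
        for xk, yk in zip(x, y):
            noise *= eps if xk != yk else 1.0 - eps
        total += markov_prob(x, p) * noise
    return total
```

The cost is exponential in the word length, so this is only useful as a reference against which faster evaluations can be compared.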




2. Random Field Ising Model 

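It was observed in [12] that the probability $Q(y_m, \ldots, y_n)$ of a cylindric event
$\{Y_m = y_m, \ldots, Y_n = y_n\}$, $m \le n$, can be expressed via a partition function of a random
field Ising model. We exploit this observation further. Assume $p > 0$ and $\varepsilon > 0$, and put

\[ J = \frac{1}{2} \log\frac{1-p}{p}, \qquad K = \frac{1}{2} \log\frac{1-\varepsilon}{\varepsilon}. \]

Then for any $(y_m, \ldots, y_n) \in \{-1,1\}^{n-m+1}$ the expression for the cylinder probability (1.2)
can be rewritten as

\[
Q(y_m, \ldots, y_n) = \frac{c_J}{\lambda_{J,K}^{\,n-m+1}} \sum_{x_m^n \in \{-1,1\}^{n-m+1}}
\exp\Bigl( J \sum_{i=m}^{n-1} x_i x_{i+1} + K \sum_{i=m}^{n} x_i y_i \Bigr),
\]

where

\[ c_J = \cosh(J), \qquad \lambda_{J,K} = 2\bigl(\cosh(J+K) + \cosh(J-K)\bigr) = 4\cosh(J)\cosh(K). \]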

The non-trivial part of the cylinder probability, the sum over all hidden configurations
$(x_m, \ldots, x_n)$,

\[
Z_{m,n}(y_m^n) := \sum_{x_m^n} \exp\Bigl( J \sum_{i=m}^{n-1} x_i x_{i+1} + K \sum_{i=m}^{n} x_i y_i \Bigr),
\]

is in fact the partition function of the Ising model with the random field given by the $y$'s.
Applying the recursive method of [9], the partition function can be evaluated in the following
fashion [1]. Consider the following functions

\[ A(w) = \frac{1}{2} \log\frac{\cosh(w+J)}{\cosh(w-J)}, \]

\[ B(w) = \frac{1}{2} \log\bigl[ 4\cosh(w+J)\cosh(w-J) \bigr] = \frac{1}{2} \log\bigl[ e^{2w} + e^{-2w} + e^{2J} + e^{-2J} \bigr]. \]

One readily checks that if $s = \pm 1$, then for all $w \in \mathbb{R}$

\[ \exp\bigl( sA(w) + B(w) \bigr) = 2\cosh(w + sJ). \tag{2.1} \]
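Now the partition function can be evaluated by summing out the right-most spin. Namely,
suppose $m < n$ and $y_m^n \in \{-1,1\}^{n-m+1}$; then

\[
\begin{aligned}
Z_{m,n}(y_m^n)
 &= \sum_{x_m^{n-1}} \exp\Bigl( J \sum_{i=m}^{n-2} x_i x_{i+1} + K \sum_{i=m}^{n-1} x_i y_i \Bigr)
    \sum_{x_n} \exp\bigl( x_n (J x_{n-1} + K y_n) \bigr) \\
 &= \sum_{x_m^{n-1}} \exp\Bigl( J \sum_{i=m}^{n-2} x_i x_{i+1} + K \sum_{i=m}^{n-1} x_i y_i \Bigr)
    \bigl[ 2\cosh(J x_{n-1} + K y_n) \bigr] \\
 &= \sum_{x_m^{n-1}} \exp\Bigl( J \sum_{i=m}^{n-2} x_i x_{i+1} + K \sum_{i=m}^{n-1} x_i y_i \Bigr)
    \exp\bigl( x_{n-1} A(w_n^{(n)}) + B(w_n^{(n)}) \bigr),
\end{aligned}
\]

where $w_n^{(n)} = K y_n$; the last step is (2.1) with $s = x_{n-1}$ and $w = K y_n$. Hence,

\[
Z_{m,n}(y_m^n) = \sum_{x_m^{n-1}} \exp\Bigl( J \sum_{i=m}^{n-2} x_i x_{i+1} + K \sum_{i=m}^{n-2} x_i y_i
 + x_{n-1} \bigl[ K y_{n-1} + A(w_n^{(n)}) \bigr] \Bigr) \times \exp\bigl( B(w_n^{(n)}) \bigr),
\]

and thus the new sum has exactly the same form, but instead of $K y_{n-1}$, we now have
$w_{n-1}^{(n)} = K y_{n-1} + A(w_n^{(n)})$. Continuing the summation over the remaining right-most
$x$-spins, one gets

\[ Z_{m,n}(y_m^n) = 2\cosh(w_m^{(n)}) \exp\Bigl( \sum_{i=m+1}^{n} B(w_i^{(n)}) \Bigr), \]

where

\[ w_i^{(n)} = K y_i + A(w_{i+1}^{(n)}) \]

for every $i < n$, or equivalently, since $A(0) = 0$, we can define

\[ w_i^{(n)} = 0 \quad \forall i > n, \qquad \text{and} \qquad w_i^{(n)} = K y_i + A(w_{i+1}^{(n)}) \quad \forall i \le n. \]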
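The recursion above can be compared against direct enumeration of the partition function. The sketch below assumes the definitions of $A$, $B$ and $w_i^{(n)}$ exactly as stated; the helper names `partition_bruteforce` and `partition_recursive` are illustrative.

```python
import math
from itertools import product

def A(w, J):
    return 0.5 * math.log(math.cosh(w + J) / math.cosh(w - J))

def B(w, J):
    return 0.5 * math.log(4.0 * math.cosh(w + J) * math.cosh(w - J))

def partition_bruteforce(y, J, K):
    """Z(y) by summing exp(J*sum x_i x_{i+1} + K*sum x_i y_i) over all spin configurations."""
    total = 0.0
    for x in product((-1, 1), repeat=len(y)):
        energy = J * sum(a * b for a, b in zip(x, x[1:]))
        energy += K * sum(a * b for a, b in zip(x, y))
        total += math.exp(energy)
    return total

def partition_recursive(y, J, K):
    """Z(y) = 2 cosh(w_m) exp(sum_{i>m} B(w_i)), with w_n = K y_n and w_i = K y_i + A(w_{i+1})."""
    w = K * y[-1]
    log_tail = 0.0
    for yi in reversed(y[:-1]):
        log_tail += B(w, J)       # contribution of the spin just summed out
        w = K * yi + A(w, J)      # effective field on the next spin to the left
    return 2.0 * math.cosh(w) * math.exp(log_tail)
```

The recursive evaluation is linear in the length of the word, whereas the enumeration is exponential.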


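Therefore, we obtain the following expressions for the cylinder and conditional probabilities:

\[
Q(y_0^n) = \frac{c_J}{\lambda_{J,K}^{\,n+1}}\, 2\cosh(w_0^{(n)}) \exp\Bigl( \sum_{j=1}^{n} B(w_j^{(n)}) \Bigr),
\qquad
Q(y_0 \mid y_1^n) = \frac{1}{\lambda_{J,K}} \, \frac{\cosh(w_0^{(n)}) \exp\bigl( B(w_1^{(n)}) \bigr)}{\cosh(w_1^{(n)})}.
\tag{2.2}
\]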
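As a sanity check of (2.2), the cylinder probability obtained from the recursion should equal $1/2$ for a single symbol, and the conditional probability should equal the ratio of two cylinder probabilities. The sketch below assumes the cylinder probability is $c_J \lambda_{J,K}^{-(n+1)} Z_{0,n}(y_0^n)$ with $Z_{0,n} = 2\cosh(w_0^{(n)})\exp(\sum_{j \ge 1} B(w_j^{(n)}))$; the function names are illustrative.

```python
import math

def A(w, J):
    return 0.5 * math.log(math.cosh(w + J) / math.cosh(w - J))

def B(w, J):
    return 0.5 * math.log(4.0 * math.cosh(w + J) * math.cosh(w - J))

def couplings(p, eps):
    J = 0.5 * math.log((1.0 - p) / p)
    K = 0.5 * math.log((1.0 - eps) / eps)
    return J, K, 4.0 * math.cosh(J) * math.cosh(K)

def fields(y, J, K):
    """[w_0, ..., w_n] from w_i = K*y_i + A(w_{i+1}), started at w_{n+1} = 0."""
    w = [0.0] * (len(y) + 1)
    for i in range(len(y) - 1, -1, -1):
        w[i] = K * y[i] + A(w[i + 1], J)
    return w[:-1]

def cylinder_prob(y, p, eps):
    J, K, lam = couplings(p, eps)
    w = fields(y, J, K)
    z = 2.0 * math.cosh(w[0]) * math.exp(sum(B(wi, J) for wi in w[1:]))
    return math.cosh(J) * z / lam ** len(y)

def conditional_prob(y, p, eps):
    """Q(y_0 | y_1, ..., y_n) for a word of length at least 2."""
    J, K, lam = couplings(p, eps)
    w = fields(y, J, K)
    return math.cosh(w[0]) * math.exp(B(w[1], J)) / (lam * math.cosh(w[1]))
```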
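3. Thermodynamic formalism


Let $\Omega = \mathcal{A}^{\mathbb{Z}_+}$, where $\mathcal{A}$ is a finite alphabet, be the space of one-sided infinite
sequences $\omega = (\omega_0, \omega_1, \ldots)$ in alphabet $\mathcal{A}$ ($\omega_i \in \mathcal{A}$ for all $i$). We equip
$\Omega$ with the metric

\[ d(\omega, \tilde\omega) = 2^{-k(\omega, \tilde\omega)}, \]

where $k(\omega, \tilde\omega) = 1$ if $\omega_0 \ne \tilde\omega_0$, and
$k(\omega, \tilde\omega) = \max\{ k \in \mathbb{N} : \omega_i = \tilde\omega_i \ \forall i = 0, \ldots, k-1 \}$
otherwise. Denote by $S : \Omega \to \Omega$ the left shift:

\[ (S\omega)_i = \omega_{i+1} \quad \text{for all } i \in \mathbb{Z}_+. \]

A Borel probability measure $\mathbb{P}$ is translation invariant if

\[ \mathbb{P}(S^{-1} C) = \mathbb{P}(C) \]

for any Borel event $C \subseteq \Omega$.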

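Let us recall the following well-known definitions:

Definition 3.1. Suppose $\mathbb{P}$ is a fully supported translation invariant measure on
$\Omega = \mathcal{A}^{\mathbb{Z}_+}$, where $\mathcal{A}$ is a finite alphabet.

(i) The measure $\mathbb{P}$ is called a g-measure, if for some positive continuous function
$g : \Omega \to (0,1)$ satisfying the normalization condition

\[ \sum_{\tilde\omega_0 \in \mathcal{A}} g(\tilde\omega_0, \omega_1, \omega_2, \ldots) = 1 \]

for all $\omega = (\omega_0, \omega_1, \ldots) \in \Omega$, one has

\[ \mathbb{P}(\omega_0 \mid \omega_1, \omega_2, \ldots) = g(\omega_0, \omega_1, \ldots) \]

for $\mathbb{P}$-a.a. $\omega \in \Omega$.

(ii) The measure $\mathbb{P}$ is Bowen-Gibbs for a continuous potential $\phi : \Omega \to \mathbb{R}$, if there exist
constants $P \in \mathbb{R}$ and $C > 1$ such that for all $\omega \in \Omega$ and every $n \in \mathbb{N}$

\[
\frac{1}{C} \le \frac{\mathbb{P}(\{ \tilde\omega \in \Omega : \tilde\omega_0 = \omega_0, \ldots, \tilde\omega_{n-1} = \omega_{n-1} \})}
{\exp\bigl( (S_n\phi)(\omega) - nP \bigr)} \le C,
\]

where $(S_n\phi)(\omega) = \sum_{k=0}^{n-1} \phi(S^k\omega)$.

(iii) The measure $\mathbb{P}$ is called an equilibrium state for a continuous potential $\phi : \Omega \to \mathbb{R}$,
if $\mathbb{P}$ attains the maximum of the following functional:

\[
h(\mathbb{P}) + \int \phi \, d\mathbb{P} = \sup_{\tilde{\mathbb{P}} \in \mathcal{M}_1^*(\Omega)} \Bigl[ h(\tilde{\mathbb{P}}) + \int \phi \, d\tilde{\mathbb{P}} \Bigr],
\tag{3.1}
\]

where $h(\mathbb{P})$ is the Kolmogorov-Sinai entropy of $\mathbb{P}$ and the supremum is taken over the set
$\mathcal{M}_1^*(\Omega)$ of all translation invariant Borel probability measures on $\Omega$.


It is known that every g-measure $\mathbb{P}$ is also an equilibrium state for $\log g$, and that every
Bowen-Gibbs measure $\mathbb{P}$ for potential $\phi$ is an equilibrium state for $\phi$ as well.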
Theorem 3.1. The measure $Q$ on $\{-1,1\}^{\mathbb{Z}_+}$ (cf. (2.2)) is a g-measure for some positive
continuous function $g$ with an exponential decay of variation:

\[
\operatorname{var}_n(g) := \sup_{y, \tilde y :\ y_0^{n-1} = \tilde y_0^{n-1}} |g(y) - g(\tilde y)| \le C\rho^n,
\tag{3.2}
\]

where $C > 0$ and $\rho \in (0,1)$. The measure $Q$ is also a Bowen-Gibbs measure for a Hölder
continuous potential $\phi : \{-1,1\}^{\mathbb{Z}_+} \to \mathbb{R}$.

The result of Theorem 3.1 is actually true in much greater generality: namely, for
distributions of Hidden Markov Chains $\{Y_n\}$ where the underlying Markov chain $\{X_n\}$
has a strictly positive transition probability matrix $P$; see [13] for a review of several results
of this nature. However, the present situation is rather exceptional since one is able to
identify the g-function and the Gibbs potential $\phi$ explicitly. Another interesting question
is the estimate of the decay rate $\rho$. In [13] a number of previously known estimates of the
rate of exponential decay in (3.2) have been compared; the best known estimate for $\rho$,

\[ \rho \le |1 - 2p|, \]

is due to [7] and [6]. Quite surprisingly, the estimate does not depend on $\varepsilon$, and in fact,
it was conjectured in [13] that the estimate could be improved, e.g., by incorporating
dependence on $\varepsilon$. The proof of Theorem 3.1 shows that this is indeed the case and one
obtains a new estimate

\[ \rho \le \rho^*(p, \varepsilon) < |1 - 2p|. \]

We start with the following technical result.

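Lemma 3.2. Fix $y = (y_0, y_1, \ldots) \in \{-1,1\}^{\mathbb{Z}_+}$. For every $n \in \mathbb{Z}_+$ define the sequence
$w_i^{(n)} = w_i^{(n)}(y)$, $i \in \mathbb{Z}_+$, by letting $w_i^{(n)} = 0$ for every $i \ge n+1$ and
$w_i^{(n)} = K y_i + A(w_{i+1}^{(n)})$ for $i \le n$. Then for every $i \in \mathbb{Z}_+$ the limit

\[ \lim_{n \to \infty} w_i^{(n)}(y) =: w_i(y) \]

exists. Moreover, there exist constants $\rho \in (0,1)$ and $C > 0$, both independent of $y$, such that

\[ |w_i^{(n)}(y) - w_i(y)| \le C\rho^{n-i} \tag{3.3} \]

for all $n \ge i$, and therefore, $w_i : \{-1,1\}^{\mathbb{Z}_+} \to \mathbb{R}$ is Hölder continuous for every $i \in \mathbb{Z}_+$:

\[ |w_i(y) - w_i(\tilde y)| = |w_0(S^i y) - w_0(S^i \tilde y)| \le C' \bigl( d(S^i y, S^i \tilde y) \bigr)^{\theta} \]

for some $C', \theta > 0$ and all $y, \tilde y \in \{-1,1\}^{\mathbb{Z}_+}$.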

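Proof. Suppose $i \le n < m$. Then

\[
|w_i^{(n)} - w_i^{(m)}| = |A(w_{i+1}^{(n)}) - A(w_{i+1}^{(m)})| \le |w_{i+1}^{(n)} - w_{i+1}^{(m)}| \cdot \sup_w \Bigl| \frac{dA}{dw} \Bigr|.
\]

One has

\[ \frac{dA}{dw} = \frac{\sinh(2J)}{\cosh(2J) + \cosh(2w)}, \]

and hence

\[
\rho := \sup_w \Bigl| \frac{dA}{dw} \Bigr| = \frac{|\sinh(2J)|}{\cosh(2J) + 1} = |\tanh(J)| = |1 - 2p| < 1.
\tag{3.4}
\]

Moreover, for all $i \in \mathbb{Z}_+$,

\[ |w_i^{(n)}| = |K y_i + A(w_{i+1}^{(n)})| \le |K| + |\operatorname{arctanh}(1 - 2p)| = |K| + |J| =: C_1. \]

Therefore, for $i \le n < m$,

\[
|w_i^{(n)} - w_i^{(m)}| \le \rho^{\,n+1-i} |w_{n+1}^{(n)} - w_{n+1}^{(m)}| = \rho^{\,n+1-i} |w_{n+1}^{(m)}| \le C_1 \rho^{\,n+1-i}.
\]

Hence, $\lim_{n \to \infty} w_i^{(n)} =: w_i$ exists and

\[
|w_i^{(n)} - w_i| \le \sum_{m \ge n} |w_i^{(m)} - w_i^{(m+1)}| \le C_1 \sum_{m \ge n} \rho^{\,m-i} = \frac{C_1}{1 - \rho}\, \rho^{\,n-i} =: C\rho^{\,n-i}.
\]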
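The contraction estimate can be observed numerically. The sketch below assumes the truncated fields $w_0^{(n)}$ as defined in Lemma 3.2 and checks the bound $|w_0^{(n)} - w_0^{(m)}| \le C_1 \rho^{n+1}$ on a pseudo-random word; the helper name `w0_truncated` and the choice of test word are illustrative.

```python
import math
import random

def A(w, J):
    return 0.5 * math.log(math.cosh(w + J) / math.cosh(w - J))

def dA(w, J):
    """Derivative of A: sinh(2J) / (cosh(2J) + cosh(2w))."""
    return math.sinh(2.0 * J) / (math.cosh(2.0 * J) + math.cosh(2.0 * w))

def w0_truncated(y, n, J, K):
    """w_0^{(n)}(y): start from w_{n+1} = 0 and iterate w_i = K*y_i + A(w_{i+1}) down to i = 0."""
    w = 0.0
    for i in range(n, -1, -1):
        w = K * y[i] + A(w, J)
    return w
```

For $p = 0.2$ the rate is $\rho = |1 - 2p| = 0.6$, so each additional observed symbol shrinks the error in $w_0$ by at least that factor.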


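The estimate in (3.4) can be improved. Firstly, assume that $\varepsilon < p$. In this case, $|K| > |J| > 0$,
and if $i \le n$, then

\[ w_i^{(n)} \in \bigl[ -|K| - |J|,\ -|K| + |J| \bigr] \cup \bigl[ |K| - |J|,\ |K| + |J| \bigr], \]

i.e., $w_i^{(n)}$ is bounded away from $0$ (see Figure 1(a)). Therefore, we can define $\rho$ by

\[
\begin{aligned}
\rho = \rho(J, K) = \rho(p, \varepsilon)
 &= \sup_{w \in [\,|K| - |J|,\ |K| + |J|\,]} \Bigl| \frac{dA}{dw} \Bigr|
  = \frac{|\sinh(2J)|}{\cosh(2J) + \cosh(2K - 2J)} \\
 &= \frac{\varepsilon(1 - \varepsilon)\, |1 - 2p|}{p^2 - 2p\varepsilon + \varepsilon}
  = \frac{\varepsilon(1 - \varepsilon)}{(p - \varepsilon)^2 + \varepsilon(1 - \varepsilon)}\, |1 - 2p| < |1 - 2p|.
\end{aligned}
\]

Figure 1. Graphs of $F_+(w) = K + A(w)$, $F_-(w) = -K + A(w)$ for (a)
$\varepsilon < p < 0.5$ and (b) $p < \varepsilon < 0.5$.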
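The closed form of the improved rate can be compared with the derivative of $A$ evaluated at the left end point $|K| - |J|$ of the interval. This is a small numerical sketch, assuming $\varepsilon < p < 1/2$; the function names are illustrative.

```python
import math

def improved_rate(p, eps):
    """Closed form from the text: eps(1-eps)|1-2p| / ((p-eps)^2 + eps(1-eps))."""
    return eps * (1.0 - eps) * abs(1.0 - 2.0 * p) / ((p - eps) ** 2 + eps * (1.0 - eps))

def rate_from_derivative(p, eps):
    """|dA/dw| at w = |K| - |J|, where the supremum over the interval is attained."""
    J = 0.5 * math.log((1.0 - p) / p)
    K = 0.5 * math.log((1.0 - eps) / eps)
    w_min = abs(K) - abs(J)
    return abs(math.sinh(2.0 * J)) / (math.cosh(2.0 * J) + math.cosh(2.0 * w_min))
```

For example, with $p = 0.3$ and $\varepsilon = 0.1$ the factor multiplying $|1 - 2p|$ is $0.09 / 0.13$, roughly $0.69$.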


If $\varepsilon > p$ (equivalently, $K < J$), then the images of the maps $F_+(w) = K + A(w)$ and
$F_-(w) = -K + A(w)$ are no longer disjoint (cf. Figure 1(b)). Nevertheless, one can
consider second iterations:

\[
\begin{aligned}
|w_i^{(n)} - w_i^{(m)}| &= |A(w_{i+1}^{(n)}) - A(w_{i+1}^{(m)})|
 = \bigl| A\bigl(K y_{i+1} + A(w_{i+2}^{(n)})\bigr) - A\bigl(K y_{i+1} + A(w_{i+2}^{(m)})\bigr) \bigr| \\
 &\le \Bigl( \sup_w |A'(K + A(w))\, A'(w)| \Bigr) |w_{i+2}^{(n)} - w_{i+2}^{(m)}| =: \rho^{(2)} |w_{i+2}^{(n)} - w_{i+2}^{(m)}|.
\end{aligned}
\]

One can show that

\[ \rho^{(2)} = \sup_w |A'(K + A(w))\, A'(w)| < (1 - 2p)^2. \tag{3.5} \]

Informally, it is evident that the maximal value of the derivative $|A'(\cdot)|$, equal to $|1 - 2p|$,
is attained if $w = 0$ or if $K + A(w) = 0$, but then $K + A(w) \ne 0$ or $w \ne 0$, respectively,
and hence (3.5) holds. A similar argument generalises to all $w$. Firstly, note that

\[
|A'(K + A(w))\, A'(w)| = \frac{(1 - 2p)^2}{\bigl( \alpha + (1 - \alpha)\cosh(2K + 2A(w)) \bigr) \cdot \bigl( \alpha + (1 - \alpha)\cosh(2w) \bigr)},
\tag{3.6}
\]

where $\alpha = (1 - p)^2 + p^2$, $1 - \alpha = 2p(1 - p)$. Let $\Delta > 0$ be such that for all $w \in [-\Delta, \Delta]$
one has $|A(w)| < |K|/2$ and $\cosh(2K + 2A(w)) > \cosh(K)$, and hence

\[ |A'(K + A(w))\, A'(w)| \le \frac{(1 - 2p)^2}{\bigl( \alpha + (1 - \alpha)\cosh(K) \bigr) \cdot 1} < (1 - 2p)^2. \]

For $w \notin [-\Delta, \Delta]$, one has

\[ |A'(K + A(w))\, A'(w)| \le \frac{(1 - 2p)^2}{1 \cdot \bigl( \alpha + (1 - \alpha)\cosh(\Delta) \bigr)} < (1 - 2p)^2. \]

Hence,

\[
\rho^{(2)} \le \frac{(1 - 2p)^2}{\min\bigl\{ \alpha + (1 - \alpha)\cosh(K),\ \alpha + (1 - \alpha)\cosh(\Delta) \bigr\}} < (1 - 2p)^2,
\]

and hence $\rho = \sqrt{\rho^{(2)}} < |1 - 2p|$. Sharper bounds can be achieved by studying the minimum
of the denominator in (3.6). □
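The quantity $\rho^{(2)}$ is easy to approximate on a grid, which gives a feeling for how much is gained over $(1 - 2p)^2$. The sketch below is a numerical illustration only, not part of the proof; the grid range, the step count and the function name `second_iterate_rate` are arbitrary choices.

```python
import math

def A(w, J):
    return 0.5 * math.log(math.cosh(w + J) / math.cosh(w - J))

def dA(w, J):
    return math.sinh(2.0 * J) / (math.cosh(2.0 * J) + math.cosh(2.0 * w))

def second_iterate_rate(p, eps, w_max=10.0, steps=20001):
    """Grid approximation of sup_w |A'(K + A(w)) A'(w)| over [-w_max, w_max]."""
    J = 0.5 * math.log((1.0 - p) / p)
    K = 0.5 * math.log((1.0 - eps) / eps)
    best = 0.0
    for k in range(steps):
        w = -w_max + 2.0 * w_max * k / (steps - 1)
        best = max(best, abs(dA(K + A(w, J), J) * dA(w, J)))
    return best
```

For instance, `math.sqrt(second_iterate_rate(0.1, 0.3))` gives a per-step rate to compare with $|1 - 2p| = 0.8$.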

Proof of Theorem 3.1. To show that $Q$ is a g-measure it is sufficient to show that the
conditional probabilities $Q(y_0 \mid y_1^n)$ converge uniformly as $n \to \infty$. Given that

\[
Q(y_0 \mid y_1^n) = \frac{1}{\lambda_{J,K}} \, \frac{\cosh(w_0^{(n)}) \exp\bigl( B(w_1^{(n)}) \bigr)}{\cosh(w_1^{(n)})},
\tag{3.7}
\]

and using the result of Lemma 3.2, $w_i^{(n)}(y) \to w_i(y)$ as $n \to \infty$, we obtain uniform
convergence of conditional probabilities, and hence, $Q$ is a g-measure with $g$ given by

\[
g(y) = \frac{1}{\lambda_{J,K}} \, \frac{\cosh(w_0(y)) \exp\bigl( B(w_1(y)) \bigr)}{\cosh(w_1(y))}.
\tag{3.8}
\]

Let us introduce the following functions: for $y \in \{-1,1\}^{\mathbb{Z}_+}$, put

\[ \phi(y) = B(w_0(y)), \qquad h(y) = \cosh(w_0(y)) \exp\bigl( -B(w_0(y)) \bigr). \]
Taking into account that $w_1(y) = w_0(Sy)$, one has

\[ g(y) = \frac{e^{\phi(y)}}{\lambda_{J,K}} \, \frac{h(y)}{h(Sy)}. \]

Since every g-measure is also an equilibrium state for $\log g$, we conclude that $Q$ is an
equilibrium state for

\[ \tilde\phi(y) = \phi(y) + \log h(y) - \log h(Sy) - \log \lambda_{J,K}. \]

The difference $\tilde\phi(y) - \phi(y)$ has a very special form: it is a sum of a so-called coboundary
($\log h(y) - \log h(Sy)$) and a constant ($-\log \lambda_{J,K}$). Two potentials whose difference is of
such a form have identical sets of equilibrium states. The reason is that for any translation
invariant measure $Q'$ one has

\[
\int \tilde\phi \, dQ' - \int \phi \, dQ' = \int \bigl( \log h(y) - \log h(Sy) - \log \lambda_{J,K} \bigr) \, dQ' = -\log \lambda_{J,K} = \text{const}.
\]

Therefore, if $Q'$ achieves the maximum in the right-hand side of (3.1) for $\tilde\phi$, then $Q'$ achieves
the maximum for $\phi$ as well. Thus $Q$ is also an equilibrium state for

\[
\phi(y) = B(w_0(y)) = \frac{1}{2} \log\Bigl[ 4\sinh^2(w_0(y)) + \frac{1}{p(1 - p)} \Bigr].
\]

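Any equilibrium measure for a Hölder continuous potential $\phi$ is also a Bowen-Gibbs
measure [3]. In our particular case, a direct proof of the Bowen-Gibbs property for $Q$ is
straightforward. Indeed, using the result of (2.2) and the notation introduced above, for every
$y = (y_0, y_1, \ldots)$ one has

\[
\begin{aligned}
Q(y_0^n) &= \frac{2 c_J}{\lambda_{J,K}^{\,n+1}} \exp\Bigl( \sum_{i=1}^{n} B(w_i^{(n)}(y)) \Bigr) \cosh(w_0^{(n)}(y)) \\
 &= \frac{2 c_J \cosh(w_0^{(n)}(y))}{\exp\bigl( B(w_0(y)) \bigr)} \exp\Bigl( \sum_{i=1}^{n} \bigl[ B(w_i^{(n)}(y)) - B(w_i(y)) \bigr] \Bigr)
   \times \exp\Bigl( \sum_{i=0}^{n} B(w_i(y)) - (n+1) \log \lambda_{J,K} \Bigr).
\end{aligned}
\]

Therefore, for $P = \log \lambda_{J,K}$,

\[
\frac{Q(y_0^n)}{\exp\bigl( (S_{n+1}\phi)(y) - (n+1)P \bigr)}
 = \frac{2 c_J \cosh(w_0^{(n)}(y))}{\exp\bigl( B(w_0(y)) \bigr)} \exp\Bigl( \sum_{i=1}^{n} \bigl[ B(w_i^{(n)}(y)) - B(w_i(y)) \bigr] \Bigr).
\]

It only remains to demonstrate that the right-hand side is uniformly bounded (both in $n$
and $y = (y_0, y_1, \ldots)$) from below and above by some positive constants $\underline{C}$, $\overline{C}$, respectively.
Indeed, since $p, \varepsilon > 0$, $I = [-|K| - |J|, |K| + |J|]$ is a finite interval, and by the result of the
previous Lemma, $w_i^{(n)}(y) \in I$ for all $i$ and $n$. Using (3.3), one readily checks that the
following choice of constants suffices:

\[
\overline{C} = 2 c_J \, \frac{\sup_{w \in I} \cosh(w)}{\inf_{w \in I} \exp(B(w))} \exp\Bigl( \frac{C}{1 - \rho} \sup_{w \in I} \Bigl| \frac{dB}{dw} \Bigr| \Bigr) < \infty,
\qquad
\underline{C} = 2 c_J \, \frac{\inf_{w \in I} \cosh(w)}{\sup_{w \in I} \exp(B(w))} \exp\Bigl( -\frac{C}{1 - \rho} \sup_{w \in I} \Bigl| \frac{dB}{dw} \Bigr| \Bigr) > 0.
\]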

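We complete this section with a curious continued fraction representation of the
g-function (3.8).

Proposition 3.3. For every $y = (y_0, y_1, \ldots) \in \{-1,1\}^{\mathbb{Z}_+}$, one has

\[
2 g(y) = a_1 - \cfrac{b_1}{a_2 - \cfrac{b_2}{a_3 - \cfrac{b_3}{a_4 - \cdots}}},
\]

where for $i \ge 1$

\[ q_i = (1 - 2p)\, y_{i-1} y_i, \qquad a_i = 1 + q_i, \qquad b_i = 4\varepsilon(1 - \varepsilon)\, q_i. \tag{3.9} \]

Proof. Using elementary transformations, one can show that for every
$y = (y_0, y_1, \ldots) \in \{-1,1\}^{\mathbb{Z}_+}$ one has

\[
g(y) = \frac{1}{\lambda_{J,K}} \, \frac{\cosh(w_0(y))}{\cosh(w_1(y))} \exp\bigl( B(w_1(y)) \bigr)
 = \frac{1}{2} + \frac{1}{2} (1 - 2p)(1 - 2\varepsilon)\, y_0 \tanh(w_1(y)).
\]

Since

\[ \tanh(A(w)) = \tanh(J) \tanh(w) = (1 - 2p) \tanh(w) \quad \text{for all } w \in \mathbb{R}, \]

for every $i \in \mathbb{Z}_+$ one has

\[
\begin{aligned}
\tanh(w_i) &= \frac{\tanh(K y_i) + \tanh(A(w_{i+1}))}{1 + \tanh(K y_i) \cdot \tanh(A(w_{i+1}))} \\
 &= \frac{(1 - 2\varepsilon) y_i + (1 - 2p) \tanh(w_{i+1})}{1 + (1 - 2\varepsilon)(1 - 2p)\, y_i \tanh(w_{i+1})} \\
 &= y_i \, \frac{(1 - 2\varepsilon) + (1 - 2p)\, y_i \tanh(w_{i+1})}{1 + (1 - 2\varepsilon)(1 - 2p)\, y_i \tanh(w_{i+1})}.
\end{aligned}
\]

Therefore, if we let $z_i = (1 - 2p)(1 - 2\varepsilon)\, y_{i-1} \tanh(w_i)$, $i \in \mathbb{N}$, then

\[
z_i = (1 - 2p)\, y_{i-1} y_i - \frac{4\varepsilon(1 - \varepsilon)(1 - 2p)\, y_{i-1} y_i}{1 + z_{i+1}} = q_i - \frac{b_i}{1 + z_{i+1}}.
\]

Since $g(y) = \frac{1}{2} + \frac{1}{2} z_1$, we obtain the continued fraction expansion with the coefficients (3.9). □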
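The continued fraction can be evaluated from the inside out and compared with the direct formula $g = \frac12 + \frac12(1-2p)(1-2\varepsilon)\,y_0\tanh(w_1)$. The sketch below truncates both at a finite word, i.e. it uses $w_{n+1} = 0$ and hence $z_{n+1} = 0$; under that assumption the two evaluations should agree up to rounding. The function names are illustrative.

```python
import math

def A(w, J):
    return 0.5 * math.log(math.cosh(w + J) / math.cosh(w - J))

def g_continued_fraction(y, p, eps):
    """g from 2g = a_1 - b_1/(a_2 - b_2/(...)), truncated after a_n with n = len(y) - 1."""
    t = 1.0  # plays the role of 1 + z_{n+1}, with z_{n+1} = 0
    for i in range(len(y) - 1, 0, -1):
        q = (1.0 - 2.0 * p) * y[i - 1] * y[i]
        a = 1.0 + q
        b = 4.0 * eps * (1.0 - eps) * q
        t = a - b / t          # 1 + z_i = a_i - b_i / (1 + z_{i+1})
    return 0.5 * t

def g_direct(y, p, eps):
    """g = 1/2 + 1/2 (1-2p)(1-2eps) y_0 tanh(w_1), with w_1 computed from the same finite word."""
    J = 0.5 * math.log((1.0 - p) / p)
    K = 0.5 * math.log((1.0 - eps) / eps)
    w1 = 0.0
    for yi in reversed(y[1:]):
        w1 = K * yi + A(w1, J)
    return 0.5 + 0.5 * (1.0 - 2.0 * p) * (1.0 - 2.0 * eps) * y[0] * math.tanh(w1)
```

For the word $(1, 1)$ with $p = 0.2$, $\varepsilon = 0.1$ one gets $q_1 = 0.6$, $a_1 = 1.6$, $b_1 = 0.216$, hence $g = 0.692$ by both routes.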
4. Two-sided conditional probabilities and denoising


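In the previous section we established that $Q$ is a Bowen-Gibbs measure. The notion of
a Gibbs measure originates in Statistical Mechanics, and is not equivalent to the
Bowen-Gibbs definition. In Statistical Mechanics, one is interested in two-sided conditional
probabilities

\[ Q(y_0 \mid y_{-m}^{-1}, y_1^n) \quad \text{or} \quad Q(y_0 \mid y_{<0}, y_{>0}) := Q(y_0 \mid y_{-\infty}^{-1}, y_1^{\infty}). \]

The method of Section 2 can be used to evaluate the conditional probabilities
$Q(y_0 \mid y_{-m}^{-1}, y_1^n)$, $m, n > 0$, for $y = (\ldots, y_{-1}, y_0, y_1, \ldots) \in \{-1,1\}^{\mathbb{Z}}$. Indeed,

\[
Q(y_0 \mid y_{-m}^{-1}, y_1^n) = \frac{Q(y_{-m}^{-1}, y_0, y_1^n)}{Q(y_{-m}^{-1}, y_0, y_1^n) + Q(y_{-m}^{-1}, \bar y_0, y_1^n)},
\]

where $\bar y_0 = -y_0$. We can evaluate

\[
Q(y_{-m}, \ldots, y_{-1}, y_0, y_1, \ldots, y_n)
 = \frac{c_J}{\lambda_{J,K}^{\,n+m+1}} \sum_{x_{-m}^{n}} \exp\Bigl( J \sum_{i=-m}^{n-1} x_i x_{i+1} + K \sum_{i=-m}^{n} x_i y_i \Bigr)
 = \frac{c_J}{\lambda_{J,K}^{\,n+m+1}}\, Z_{-m,n}(y_{-m}^{n})
\]

by first summing over the spins on the right, $x_n, \ldots, x_1$, and then summing over the spins
on the left, $x_{-m}, \ldots, x_{-1}$. One has

\[
\begin{aligned}
Z_{-m,n}(y_{-m}^{n})
 &= \sum_{x_{-m}^{0}} \exp\Bigl( J \sum_{i=-m}^{-1} x_i x_{i+1} + K \sum_{i=-m}^{-1} x_i y_i + x_0 \bigl( K y_0 + A(w_1^{(n)}) \bigr) \Bigr)
    \exp\Bigl( \sum_{i=1}^{n} B(w_i^{(n)}) \Bigr) \\
 &= \exp\Bigl( \sum_{j=-m}^{-1} B(w_j^{(-m)}) \Bigr)\, 2\cosh(w_0^{(-m,n)})\, \exp\Bigl( \sum_{i=1}^{n} B(w_i^{(n)}) \Bigr),
\end{aligned}
\]

where now $w_{-m}^{(-m)} = K y_{-m}$,

\[ w_{j+1}^{(-m)} = K y_{j+1} + A(w_j^{(-m)}), \quad j = -m, \ldots, -2, \]

and

\[ w_0^{(-m,n)} = K y_0 + A(w_{-1}^{(-m)}) + A(w_1^{(n)}). \]

Therefore,

\[
\begin{aligned}
Q(y_0 \mid y_{-m}^{-1}, y_1^n)
 &= \frac{Z_{-m,n}(y_{-m}^{-1}, y_0, y_1^n)}{Z_{-m,n}(y_{-m}^{-1}, y_0, y_1^n) + Z_{-m,n}(y_{-m}^{-1}, \bar y_0, y_1^n)} \\
 &= \frac{\cosh\bigl( K y_0 + A(w_{-1}^{(-m)}) + A(w_1^{(n)}) \bigr)}
         {\cosh\bigl( K y_0 + A(w_{-1}^{(-m)}) + A(w_1^{(n)}) \bigr) + \cosh\bigl( -K y_0 + A(w_{-1}^{(-m)}) + A(w_1^{(n)}) \bigr)}.
\end{aligned}
\]

Again, given this expression, one easily establishes uniform convergence and existence of
the limits

\[ Q(y_0 \mid y_{-\infty}^{-1}, y_1^{\infty}) = \lim_{m, n \to \infty} Q(y_0 \mid y_{-m}^{-1}, y_1^n). \]

Thus the two-sided conditional probabilities are also regular, cf. Theorem 3.1.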
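The two-sided formula can be checked against a brute-force ratio of cylinder probabilities. The sketch below assumes the left and right recursions as written above, with $m, n \ge 1$; the names `cylinder_prob_bruteforce` and `two_sided` are illustrative.

```python
import math
from itertools import product

def A(w, J):
    return 0.5 * math.log(math.cosh(w + J) / math.cosh(w - J))

def cylinder_prob_bruteforce(y, p, eps):
    """Q(y) by enumerating hidden paths of the symmetric chain observed through the channel."""
    total = 0.0
    for x in product((-1, 1), repeat=len(y)):
        prob = 0.5
        for a, b in zip(x, x[1:]):
            prob *= p if a != b else 1.0 - p
        for xk, yk in zip(x, y):
            prob *= eps if xk != yk else 1.0 - eps
        total += prob
    return total

def two_sided(left, y0, right, p, eps):
    """Q(y_0 | left, right), left = (y_{-m}, ..., y_{-1}), right = (y_1, ..., y_n)."""
    J = 0.5 * math.log((1.0 - p) / p)
    K = 0.5 * math.log((1.0 - eps) / eps)
    w_left = K * left[0]                 # field at site -m
    for yj in left[1:]:
        w_left = K * yj + A(w_left, J)   # ends at site -1
    w_right = 0.0
    for yi in reversed(right):
        w_right = K * yi + A(w_right, J) # ends at site 1
    h = A(w_left, J) + A(w_right, J)
    num = math.cosh(K * y0 + h)
    return num / (num + math.cosh(-K * y0 + h))
```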


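4.1. Denoising. Reconstruction of signals corrupted by noise during transmission is
one of the classical problems in Information Theory. Suppose we observe a sequence $\{y_n\}$,
$n = 1, \ldots, N$, given by (1.1), i.e.,

\[ y_n = x_n \cdot z_n, \]

where $\{x_n\}$ is some unknown realisation of the Markov chain, and $\{z_n\}$ is an unknown
realisation of the Bernoulli sequence $\{Z_n\}$. The natural question is, given the observed data
$y_1^N = (y_1, \ldots, y_N)$, what is the optimal choice of $\hat X_n = \hat X_n(y_1^N)$, the estimate of $X_n$, such
that the empirical zero-one loss (bit error rate)

\[ \frac{1}{N} \sum_{n=1}^{N} \mathbb{1}[\hat X_n \ne x_n] \]

is minimal. The corresponding standard maximum a posteriori probability (MAP) estimator
(denoiser) is given by

\[ \hat X_n = \hat X_n(y_1^N) = \operatorname*{argmax}_{x \in \{-1,1\}} \mathbb{P}[X_n = x \mid Y_1^N = y_1^N], \quad n = 1, \ldots, N. \]

If the parameters of the Markov chain (i.e., $P$) and of the channel (i.e., $\Pi$) are known, the
conditional probabilities $\mathbb{P}[X_n = x \mid Y_1^N = y_1^N]$ can be found using the backward-forward
algorithm. Namely, one has

\[
\mathbb{P}[X_n = x \mid Y_1^N = y_1^N] = \frac{\alpha_n(x) \beta_n(x)}{\sum_{\tilde x} \alpha_n(\tilde x) \beta_n(\tilde x)},
\tag{4.1}
\]

where

\[ \alpha_n(x) = \mathbb{P}[Y_1^n = y_1^n, X_n = x], \qquad \beta_n(x) = \mathbb{P}[Y_{n+1}^N = y_{n+1}^N \mid X_n = x] \]

are the so-called forward and backward variables, satisfying simple recurrence relations:

\[
\alpha_{n+1}(x) = \sum_{\tilde x \in \mathcal{A}} \alpha_n(\tilde x)\, P_{\tilde x, x}\, \pi_{x, y_{n+1}}, \quad n = 1, \ldots, N-1, \quad \text{with } \alpha_1(x) = \mathbb{P}(X_1 = x)\, \pi_{x, y_1},
\]

\[
\beta_n(x) = \sum_{\tilde x \in \mathcal{A}} \beta_{n+1}(\tilde x)\, P_{x, \tilde x}\, \pi_{\tilde x, y_{n+1}}, \quad n = 1, \ldots, N-1, \quad \text{with } \beta_N(x) = 1.
\]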
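The recurrences for the forward and backward variables translate directly into code. The following is a minimal sketch for the symmetric source and channel of this paper, without the rescaling one would add for long sequences to avoid underflow; the function names `forward_backward` and `map_denoise` are illustrative.

```python
def forward_backward(y, p, eps):
    """Posteriors P[X_n = x | y] for n = 0..N-1, each returned as a dict {-1: ..., 1: ...}."""
    states = (-1, 1)
    trans = lambda a, b: p if a != b else 1.0 - p          # P_{a,b}
    emit = lambda x, yk: eps if x != yk else 1.0 - eps     # pi_{x,y}
    N = len(y)
    alpha = [{x: 0.5 * emit(x, y[0]) for x in states}]     # alpha_1(x) = P(X_1 = x) pi_{x,y_1}
    for n in range(1, N):
        prev = alpha[-1]
        alpha.append({x: emit(x, y[n]) * sum(prev[xt] * trans(xt, x) for xt in states)
                      for x in states})
    beta = [None] * N
    beta[N - 1] = {x: 1.0 for x in states}                 # beta_N(x) = 1
    for n in range(N - 2, -1, -1):
        beta[n] = {x: sum(beta[n + 1][xt] * trans(x, xt) * emit(xt, y[n + 1]) for xt in states)
                   for x in states}
    posteriors = []
    for n in range(N):
        norm = sum(alpha[n][x] * beta[n][x] for x in states)
        posteriors.append({x: alpha[n][x] * beta[n][x] / norm for x in states})
    return posteriors

def map_denoise(y, p, eps):
    """Symbol-wise MAP estimate: pick the state with the larger posterior at each position."""
    return [max(post, key=post.get) for post in forward_backward(y, p, eps)]
```

With a slowly switching source ($p = 0.05$) and noise level $\varepsilon = 0.2$, an isolated flipped symbol inside a run is corrected, since the two extra transitions cost far more than one channel error.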


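The key observation of [10] is that the probability distribution $\mathbb{P}[X_n = \cdot \mid Y_1^N = y_1^N]$, viewed
as a column vector, can be expressed in terms of the two-sided conditional probabilities
$Q[Y_n = \cdot \mid Y_{N \setminus n} = y_{N \setminus n}]$, with $N \setminus n = \{1, \ldots, N\} \setminus \{n\}$, as follows:

\[
\mathbb{P}[X_n = \cdot \mid Y_1^N = y_1^N]
 = \frac{\pi_{y_n} \odot \Pi^{-1} Q[Y_n = \cdot \mid Y_{N \setminus n} = y_{N \setminus n}]}
        {\bigl\langle \pi_{y_n} \odot \Pi^{-1} Q[Y_n = \cdot \mid Y_{N \setminus n} = y_{N \setminus n}],\ \mathbf{1} \bigr\rangle},
\tag{4.2}
\]

where $\Pi$ is the emission matrix, and $\pi_{-1}, \pi_1$ are the columns of $\Pi$:

\[
\Pi = \begin{pmatrix} 1 - \varepsilon & \varepsilon \\ \varepsilon & 1 - \varepsilon \end{pmatrix}, \qquad
\pi_{-1} = \begin{pmatrix} 1 - \varepsilon \\ \varepsilon \end{pmatrix}, \qquad
\pi_{1} = \begin{pmatrix} \varepsilon \\ 1 - \varepsilon \end{pmatrix}, \qquad
\Pi^{-1} = \frac{1}{1 - 2\varepsilon} \begin{pmatrix} 1 - \varepsilon & -\varepsilon \\ -\varepsilon & 1 - \varepsilon \end{pmatrix},
\]

and $\odot$ is the componentwise product of vectors of equal lengths,

\[ u \odot v = (u_1 \cdot v_1, \ldots, u_d \cdot v_d). \]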
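Identity (4.2) can be verified on a short word by computing both sides by enumeration. The sketch below orders vector entries as $(-1, +1)$, assumes $\varepsilon \ne 1/2$ so that $\Pi$ is invertible, and uses a 0-based position index; the function names are illustrative.

```python
from itertools import product

def joint(x, y, p, eps):
    """Joint probability of hidden path x and observed word y."""
    prob = 0.5
    for a, b in zip(x, x[1:]):
        prob *= p if a != b else 1.0 - p
    for xk, yk in zip(x, y):
        prob *= eps if xk != yk else 1.0 - eps
    return prob

def posterior_bruteforce(y, n, p, eps):
    """[P(X_n = -1 | y), P(X_n = +1 | y)] by enumerating hidden paths."""
    weights = {-1: 0.0, 1: 0.0}
    for x in product((-1, 1), repeat=len(y)):
        weights[x[n]] += joint(x, y, p, eps)
    total = weights[-1] + weights[1]
    return [weights[-1] / total, weights[1] / total]

def posterior_via_two_sided(y, n, p, eps):
    """Right-hand side of (4.2), with Q[Y_n = . | other outputs] obtained by enumeration."""
    q = {}
    for c in (-1, 1):
        yc = tuple(y[:n]) + (c,) + tuple(y[n + 1:])
        q[c] = sum(joint(x, yc, p, eps) for x in product((-1, 1), repeat=len(y)))
    norm = q[-1] + q[1]
    q_vec = [q[-1] / norm, q[1] / norm]
    # r = Pi^{-1} q, i.e. the distribution of X_n given all outputs except y_n
    r = [((1.0 - eps) * q_vec[0] - eps * q_vec[1]) / (1.0 - 2.0 * eps),
         (-eps * q_vec[0] + (1.0 - eps) * q_vec[1]) / (1.0 - 2.0 * eps)]
    # pi_{y_n}: the column of Pi indexed by the observed symbol y_n
    pi_col = [1.0 - eps, eps] if y[n] == -1 else [eps, 1.0 - eps]
    unnorm = [pi_col[0] * r[0], pi_col[1] * r[1]]
    s = sum(unnorm)
    return [u / s for u in unnorm]
```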


Expression (4.2) opens a possibility of constructing denoisers when the parameters of the
underlying Markov chain are unknown; we continue to assume that the channel remains
known. Indeed, the two-sided conditional probabilities $Q[Y_n = \cdot \mid Y_{N \setminus n} = y_{N \setminus n}]$ can be
estimated from the data. The Discrete Universal Denoiser (DUDE) [10] algorithm estimates
the conditional probabilities as

\[
\hat Q\bigl( Y_n = c \mid y_{n-k_N}^{n-1},\ y_{n+1}^{n+k_N} \bigr)
 = \frac{m\bigl( y_{n-k_N}^{n-1},\ c,\ y_{n+1}^{n+k_N} \bigr)}{\sum_{\tilde c} m\bigl( y_{n-k_N}^{n-1},\ \tilde c,\ y_{n+1}^{n+k_N} \bigr)},
\tag{4.3}
\]

where $m(a_{-k}^{-1}, c, b_1^{k})$ is the number of occurrences of the word $a_{-k}^{-1} c\, b_1^{k}$ in the observed
sequence $y_1^N = (y_1, \ldots, y_N)$; the length of the right and left contexts is set to $k_N = c \log N$,
$c > 0$. DUDE has shown excellent performance in a number of test cases. In particular,
in the case of the binary memoryless channel and the symmetric Markov chain, considered
in this paper, its performance is comparable to that of the backward-forward algorithm
(4.1), which requires full knowledge of the source distribution, while DUDE is completely
oblivious in that respect. In our opinion, the excellent performance of DUDE in this case is
partially due to the fact that $Q$ is a Gibbs measure, admitting smooth two-sided conditional
probabilities, which are well approximated by (4.3) and thus can be estimated from the
data. It will be interesting to evaluate performance in cases when the output measure is
not Gibbs.
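The counting step behind (4.3) can be sketched as follows. This illustrates only the empirical estimate of the two-sided conditional probabilities, not the full DUDE decision rule of [10]; the function names and the convention of skipping positions without a complete context are assumptions.

```python
from collections import Counter, defaultdict

def context_counts(y, k):
    """Map each two-sided context (left k symbols, right k symbols) to counts of the centre symbol."""
    counts = defaultdict(Counter)
    for n in range(k, len(y) - k):       # positions with a full context on both sides
        left = tuple(y[n - k:n])
        right = tuple(y[n + 1:n + k + 1])
        counts[(left, right)][y[n]] += 1
    return counts

def estimate_two_sided(counts, left, c, right):
    """Empirical estimate of Q(Y_n = c | left, right); None if the context was never observed."""
    ctx = counts.get((tuple(left), tuple(right)))
    if not ctx:
        return None
    return ctx[c] / sum(ctx.values())
```

For example, in the word `[1, -1, 1, 1, -1, 1, 1, -1, 1]` with `k = 1`, the context (left `1`, right `1`) occurs three times, always around a `-1`.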

The invention of DUDE sparked great interest in two-sided approaches to
information-theoretic problems. It turns out that, despite the fact that efficient algorithms
for estimation of one-sided models exist, the analogous two-sided problem is substantially
more difficult. As alternatives to (4.3), other methods to estimate two-sided conditional
probabilities have been suggested, e.g., [5,8,11]. For example, Yu and Verdú [11] proposed a
Backward-Forward Product (BFP) model:

\[ Q(y_0 \mid y_{<0}, y_{>0}) \propto Q(y_0 \mid y_{<0})\, Q(y_0 \mid y_{>0}), \]

and the one-sided conditional probabilities $Q(y_0 \mid y_{<0})$, $Q(y_0 \mid y_{>0})$ can be estimated using
standard one-sided algorithms. Note that in our model,

\[
\frac{Q(y_0 \mid y_{<0})\, Q(y_0 \mid y_{>0})}{Q(y_0 \mid y_{<0})\, Q(y_0 \mid y_{>0}) + Q(\bar y_0 \mid y_{<0})\, Q(\bar y_0 \mid y_{>0})}
 = \frac{\cosh(K y_0 + A(w_{-1})) \cosh(K y_0 + A(w_1))}
        {\cosh(K y_0 + A(w_{-1})) \cosh(K y_0 + A(w_1)) + \cosh(-K y_0 + A(w_{-1})) \cosh(-K y_0 + A(w_1))}
\]

in general does not coincide with

\[
\frac{\cosh(K y_0 + A(w_{-1}) + A(w_1))}{\cosh(K y_0 + A(w_{-1}) + A(w_1)) + \cosh(-K y_0 + A(w_{-1}) + A(w_1))} = Q(y_0 \mid y_{<0}, y_{>0}).
\]

Nevertheless, the BFP model seems to perform extremely well [11].
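The gap between the two expressions is easy to see numerically. The sketch below takes the one-sided fields $w_{-1}$ and $w_1$ as free inputs rather than computing them from data; the particular values used in the check ($J = 1$, $K = 0.5$, $w_{-1} = 0.7$, $w_1 = -0.3$) are arbitrary, and the function names are illustrative.

```python
import math

def A(w, J):
    return 0.5 * math.log(math.cosh(w + J) / math.cosh(w - J))

def exact_two_sided(y0, w_left, w_right, J, K):
    """Q(y_0 | past, future) in terms of the one-sided fields w_{-1} and w_1."""
    h = A(w_left, J) + A(w_right, J)
    num = math.cosh(K * y0 + h)
    return num / (num + math.cosh(-K * y0 + h))

def bfp(y0, w_left, w_right, J, K):
    """Normalised product of the two one-sided conditionals (Backward-Forward Product)."""
    a, b = A(w_left, J), A(w_right, J)
    num = math.cosh(K * y0 + a) * math.cosh(K * y0 + b)
    other = math.cosh(-K * y0 + a) * math.cosh(-K * y0 + b)
    return num / (num + other)
```

The two coincide when one of the one-sided fields vanishes, since then one factor of the product is symmetric in $y_0$ and cancels.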
Among other alternatives, let us mention the possibility to extend standard one-sided
algorithms to produce algorithms for estimating two-sided conditional probabilities from
data. This approach is investigated in [2], where the denoising performance of the resulting
Gibbsian models is evaluated. The Gibbsian algorithm performs better than DUDE: bit error
rates are given in the table below for noise level $\varepsilon = 0.2$ and various values of $p$ (smaller
rates are better).


    p       Gibbs     DUDE
    0.05     5.30%     5.58%
    0.10     9.91%    10.48%
    0.15    13.20%    13.77%
    0.20    18.34%    18.77%



One could also try to estimate the Gibbsian potential directly, e.g., using the estimation
procedure proposed in [4]. This method showed promising performance in experiments
on language classification and authorship attribution. In conclusion, let us also mention
that the direct two-sided Gibbs modeling of stochastic processes opens possibilities for
applying semi-parametric statistical procedures, as opposed to the universal (parameter
free) approach of DUDE.